Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 21897 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 2.2 MiB |
| Average record size in memory | 104.0 B |
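The two memory figures in this table are consistent with each other. A minimal sketch of the arithmetic, assuming that the report uses 1 MiB = 2**20 bytes and that the total is simply the record count times the average record size:

```python
# Values copied from the dataset statistics table above.
n_observations = 21897
avg_record_bytes = 104.0

# Assumption: 1 MiB = 2**20 bytes.
total_bytes = n_observations * avg_record_bytes
total_mib = total_bytes / 2**20

# Rounded to one decimal, this matches the reported 2.2 MiB.
print(f"{total_mib:.1f} MiB")
```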
Variable types
| Categorical | 2 |
|---|---|
| Numeric | 8 |
| Boolean | 3 |
Alerts
| Alert | Type |
|---|---|
| Ignore7 has constant value "False" | Constant |
| TxnTime has a high cardinality: 914 distinct values | High cardinality |
| Amount is highly overall correlated with Ignore5 | High correlation |
| Ignore3 is highly overall correlated with StoreNumber | High correlation |
| Ignore5 is highly overall correlated with Amount | High correlation |
| StoreNumber is highly overall correlated with Ignore3 | High correlation |
| SaleFlag is highly overall correlated with Ignore6 | High correlation |
| Ignore6 is highly overall correlated with SaleFlag | High correlation |
| Quantity is highly overall correlated with Ignore1 | High correlation |
| Ignore1 is highly overall correlated with Quantity | High correlation |
| Ignore5 is highly skewed (γ1 = 30.78199035) | Skewed |
| Quantity has 386 (1.8%) zeros | Zeros |
| Ignore1 has 21511 (98.2%) zeros | Zeros |
| Amount has 1365 (6.2%) zeros | Zeros |
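The zero-share and skewness alerts above rest on simple column statistics. The following is a rough sketch, not the report's own code: `zero_share` and `skewness` are hypothetical helper names, the toy column is invented, and the skewness uses the population moment coefficient, whereas the report may use a bias-adjusted estimator.

```python
def zero_share(values):
    """Fraction of entries equal to zero."""
    return sum(1 for v in values if v == 0) / len(values)


def skewness(values):
    """Population moment coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical toy column (not taken from the dataset): many zeros and one
# large value give a high zero share and a positive skew.
toy = [0, 0, 0, 1, 1, 2, 10]
print(zero_share(toy), skewness(toy))

# The reported percentage for Quantity follows from the counts in the alert.
print(round(100 * 386 / 21897, 1))  # 1.8
```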
Reproduction
| Analysis started | 2023-03-03 16:44:30.191368 |
|---|---|
| Analysis finished | 2023-03-03 16:44:58.737938 |
| Duration | 28.55 seconds |
| Software version | pandas-profiling v3.5.0 |
| Download configuration | config.json |
StoreNumber
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 171.2 KiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 65691 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 108 |
|---|---|
| 2nd row | 108 |
| 3rd row | 108 |
| 4th row | 108 |
| 5th row | 108 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 108 | 16670 | 76.1% |
| 233 | 5227 | 23.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 16670 | |
| 0 | 16670 | |
| 8 | 16670 | |
| 3 | 10454 | |
| 2 | 5227 | 8.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 65691 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 16670 | |
| 0 | 16670 | |
| 8 | 16670 | |
| 3 | 10454 | |
| 2 | 5227 | 8.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 65691 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 16670 | |
| 0 | 16670 | |
| 8 | 16670 | |
| 3 | 10454 | |
| 2 | 5227 | 8.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 65691 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 16670 | |
| 0 | 16670 | |
| 8 | 16670 | |
| 3 | 10454 | |
| 2 | 5227 | 8.0% |
ItemCode
Real number (ℝ)
| Distinct | 8862 |
|---|---|
| Distinct (%) | 40.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.2321057 × 10⁹ |
| Minimum | 83 |
|---|---|
| Maximum | 9.7829 × 10¹¹ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 171.2 KiB |
Quantile statistics
| Minimum | 83 |
|---|---|
| 5-th percentile | 1.3600001 × 10⁹ |
| Q1 | 3.0400774 × 10⁹ |
| median | 5.1500001 × 10⁹ |
| Q3 | 7.0784459 × 10⁹ |
| 95-th percentile | 6.1029032 × 10¹⁰ |
| Maximum | 9.7829 × 10¹¹ |
| Range | 9.7829 × 10¹¹ |
| Interquartile range (IQR) | 4.0383684 × 10⁹ |
Descriptive statistics
| Standard deviation | 1.9701736 × 10¹⁰ |
|---|---|
| Coefficient of variation (CV) | 2.1340457 |
| Kurtosis | 805.75392 |
| Mean | 9.2321057 × 10⁹ |
| Median Absolute Deviation (MAD) | 1.9284479 × 10⁹ |
| Skewness | 18.331195 |
| Sum | 2.0215542 × 10¹⁴ |
| Variance | 3.8815839 × 10²⁰ |
| Monotonicity | Not monotonic |
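Several of the ItemCode statistics above are derived from one another, so they can be cross-checked. A small sketch using the rounded values from the tables, assuming the powers of ten read as 10^9, 10^10 and 10^11; results agree with the report only up to rounding:

```python
# Values copied from the ItemCode tables above (already rounded in the report).
mean = 9.2321057e9
std = 1.9701736e10
q1 = 3.0400774e9
q3 = 7.0784459e9
minimum = 83
maximum = 9.7829e11

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(cv, variance, iqr, value_range)
```

The printed values should be close to the reported CV (2.1340457), variance (3.8815839 × 10^20), IQR (4.0383684 × 10^9) and range (9.7829 × 10^11).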
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1600042060 | 115 | 0.5% |
| 7078400628 | 115 | 0.5% |
| 7078400620 | 111 | 0.5% |
| 7078400624 | 95 | 0.4% |
| 7078415054 | 88 | 0.4% |
| 7078402802 | 81 | 0.4% |
| 4812110208 | 75 | 0.3% |
| 7078400500 | 75 | 0.3% |
| 7.151415035 × 10¹⁰ | 73 | 0.3% |
| 5210001005 | 71 | 0.3% |
| Other values (8852) | 20998 |
| Value | Count | Frequency (%) |
| 83 | 4 | < 0.1% |
| 450 | 1 | < 0.1% |
| 741 | 1 | < 0.1% |
| 747 | 11 | |
| 764 | 2 | < 0.1% |
| 768 | 1 | < 0.1% |
| 825 | 1 | < 0.1% |
| 833 | 1 | < 0.1% |
| 930 | 1 | < 0.1% |
| 940 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 9.7829 × 10¹¹ | 1 | |
| 9.78078 × 10¹¹ | 1 | |
| 9.78032 × 10¹¹ | 1 | |
| 8.99407003 × 10¹⁰ | 1 | |
| 8.989990005 × 10¹⁰ | 1 | |
| 8.98282002 × 10¹⁰ | 1 | |
| 8.97519001 × 10¹⁰ | 1 | |
| 8.97519001 × 10¹⁰ | 1 | |
| 8.97034002 × 10¹⁰ | 1 | |
| 8.96324001 × 10¹⁰ | 1 |
TxnTime
Categorical
| Distinct | 914 |
|---|---|
| Distinct (%) | 4.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 171.2 KiB |
Length
| Max length | 16 |
|---|---|
| Median length | 16 |
| Mean length | 16 |
| Min length | 16 |
Characters and Unicode
| Total characters | 350352 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 4 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 45 |
|---|---|
| Unique (%) | 0.2% |
Sample
| 1st row | 09-05-2013 00:27 |
|---|---|
| 2nd row | 09-05-2013 00:27 |
| 3rd row | 09-05-2013 00:27 |
| 4th row | 09-05-2013 00:27 |
| 5th row | 09-05-2013 00:27 |
Common Values
| Value | Count | Frequency (%) |
| 09-05-2013 10:00 | 132 | 0.6% |
| 09-05-2013 17:28 | 100 | 0.5% |
| 09-05-2013 16:17 | 98 | 0.4% |
| 09-05-2013 17:37 | 97 | 0.4% |
| 09-05-2013 12:41 | 96 | 0.4% |
| 09-05-2013 17:59 | 96 | 0.4% |
| 09-05-2013 11:57 | 94 | 0.4% |
| 09-05-2013 14:21 | 93 | 0.4% |
| 09-05-2013 16:31 | 92 | 0.4% |
| 09-05-2013 15:07 | 91 | 0.4% |
| Other values (904) | 20908 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| 09-05-2013 | 21897 | |
| 10:00 | 132 | 0.3% |
| 17:28 | 100 | 0.2% |
| 16:17 | 98 | 0.2% |
| 17:37 | 97 | 0.2% |
| 12:41 | 96 | 0.2% |
| 17:59 | 96 | 0.2% |
| 11:57 | 94 | 0.2% |
| 14:21 | 93 | 0.2% |
| 16:31 | 92 | 0.2% |
| Other values (905) | 20999 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 75365 | |
| 1 | 48300 | |
| - | 43794 | |
| 2 | 32467 | |
| 5 | 29673 | 8.5% |
| 3 | 29395 | 8.4% |
| 9 | 26035 | 7.4% |
| (space) | 21897 | 6.2% |
| : | 21897 | 6.2% |
| 4 | 8362 | 2.4% |
| Other values (3) | 13167 | 3.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 262764 | |
| Dash Punctuation | 43794 | 12.5% |
| Space Separator | 21897 | 6.2% |
| Other Punctuation | 21897 | 6.2% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 75365 | |
| 1 | 48300 | |
| 2 | 32467 | |
| 5 | 29673 | 11.3% |
| 3 | 29395 | 11.2% |
| 9 | 26035 | 9.9% |
| 4 | 8362 | 3.2% |
| 7 | 4581 | 1.7% |
| 8 | 4415 | 1.7% |
| 6 | 4171 | 1.6% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 43794 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 21897 |
Other Punctuation
| Value | Count | Frequency (%) |
| : | 21897 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 350352 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 75365 | |
| 1 | 48300 | |
| - | 43794 | |
| 2 | 32467 | |
| 5 | 29673 | 8.5% |
| 3 | 29395 | 8.4% |
| 9 | 26035 | 7.4% |
| (space) | 21897 | 6.2% |
| : | 21897 | 6.2% |
| 4 | 8362 | 2.4% |
| Other values (3) | 13167 | 3.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 350352 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 75365 | |
| 1 | 48300 | |
| - | 43794 | |
| 2 | 32467 | |
| 5 | 29673 | 8.5% |
| 3 | 29395 | 8.4% |
| 9 | 26035 | 7.4% |
| (space) | 21897 | 6.2% |
| : | 21897 | 6.2% |
| 4 | 8362 | 2.4% |
| Other values (3) | 13167 | 3.8% |
SaleFlag
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 21.5 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| True | 14288 | 65.3% |
| False | 7609 | 34.7% |
Quantity
Real number (ℝ)
| Distinct | 20 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.242088 |
| Minimum | -2 |
|---|---|
| Maximum | 21 |
| Zeros | 386 |
| Zeros (%) | 1.8% |
| Negative | 9 |
| Negative (%) | < 0.1% |
| Memory size | 171.2 KiB |
Quantile statistics
| Minimum | -2 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 21 |
| Range | 23 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.82540752 |
|---|---|
| Coefficient of variation (CV) | 0.66453226 |
| Kurtosis | 75.422994 |
| Mean | 1.242088 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 6.4751316 |
| Sum | 27198 |
| Variance | 0.68129758 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=20)
| Value | Count | Frequency (%) |
| 1 | 17781 | |
| 2 | 2841 | 13.0% |
| 3 | 387 | 1.8% |
| 0 | 386 | 1.8% |
| 4 | 295 | 1.3% |
| 5 | 65 | 0.3% |
| 6 | 60 | 0.3% |
| 10 | 25 | 0.1% |
| 8 | 20 | 0.1% |
| 9 | 8 | < 0.1% |
| Other values (10) | 29 | 0.1% |
| Value | Count | Frequency (%) |
| -2 | 1 | < 0.1% |
| -1 | 8 | < 0.1% |
| 0 | 386 | 1.8% |
| 1 | 17781 | |
| 2 | 2841 | 13.0% |
| 3 | 387 | 1.8% |
| 4 | 295 | 1.3% |
| 5 | 65 | 0.3% |
| 6 | 60 | 0.3% |
| 7 | 6 | < 0.1% |
| Value | Count | Frequency (%) |
| 21 | 1 | < 0.1% |
| 20 | 1 | < 0.1% |
| 15 | 1 | < 0.1% |
| 14 | 3 | < 0.1% |
| 13 | 2 | < 0.1% |
| 12 | 5 | < 0.1% |
| 11 | 1 | < 0.1% |
| 10 | 25 | |
| 9 | 8 | < 0.1% |
| 8 | 20 |
Ignore1
Real number (ℝ)
| Distinct | 349 |
|---|---|
| Distinct (%) | 1.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.016239371 |
| Minimum | 0 |
|---|---|
| Maximum | 6.951 |
| Zeros | 21511 |
| Zeros (%) | 98.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 171.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 6.951 |
| Range | 6.951 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.16718086 |
|---|---|
| Coefficient of variation (CV) | 10.294787 |
| Kurtosis | 406.57853 |
| Mean | 0.016239371 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 17.096405 |
| Sum | 355.5935 |
| Variance | 0.02794944 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 21511 | |
| 1.02 | 4 | < 0.1% |
| 1 | 3 | < 0.1% |
| 0.5896 | 3 | < 0.1% |
| 1.01 | 3 | < 0.1% |
| 0.7995 | 3 | < 0.1% |
| 0.3407 | 3 | < 0.1% |
| 0.8704 | 2 | < 0.1% |
| 0.8195 | 2 | < 0.1% |
| 0.5701 | 2 | < 0.1% |
| Other values (339) | 361 | 1.6% |
| Value | Count | Frequency (%) |
| 0 | 21511 | |
| 0.01 | 2 | < 0.1% |
| 0.0102 | 1 | < 0.1% |
| 0.0108 | 1 | < 0.1% |
| 0.0112 | 1 | < 0.1% |
| 0.0116 | 1 | < 0.1% |
| 0.0171 | 1 | < 0.1% |
| 0.0181 | 1 | < 0.1% |
| 0.0186 | 1 | < 0.1% |
| 0.0206 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 6.951 | 1 | |
| 5 | 1 | |
| 4.9499 | 1 | |
| 4.8 | 1 | |
| 4.2016 | 1 | |
| 3.9199 | 1 | |
| 3.9117 | 1 | |
| 3.8498 | 1 | |
| 3.7979 | 1 | |
| 3.6291 | 1 |
Amount
Real number (ℝ)
| Distinct | 676 |
|---|---|
| Distinct (%) | 3.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.2578413 |
| Minimum | -49.99 |
|---|---|
| Maximum | 174.99 |
| Zeros | 1365 |
| Zeros (%) | 6.2% |
| Negative | 151 |
| Negative (%) | 0.7% |
| Memory size | 171.2 KiB |
Quantile statistics
| Minimum | -49.99 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1.79 |
| median | 2.64 |
| Q3 | 3.99 |
| 95-th percentile | 7.99 |
| Maximum | 174.99 |
| Range | 224.98 |
| Interquartile range (IQR) | 2.2 |
Descriptive statistics
| Standard deviation | 3.2872447 |
|---|---|
| Coefficient of variation (CV) | 1.0090254 |
| Kurtosis | 369.29381 |
| Mean | 3.2578413 |
| Median Absolute Deviation (MAD) | 1.14 |
| Skewness | 9.357096 |
| Sum | 71336.95 |
| Variance | 10.805978 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2 | 1789 | 8.2% |
| 2.5 | 1411 | 6.4% |
| 0 | 1365 | 6.2% |
| 1 | 1163 | 5.3% |
| 2.99 | 1072 | 4.9% |
| 3 | 970 | 4.4% |
| 3.99 | 869 | 4.0% |
| 4.99 | 478 | 2.2% |
| 2.49 | 437 | 2.0% |
| 3.49 | 424 | 1.9% |
| Other values (666) | 11919 |
| Value | Count | Frequency (%) |
| -49.99 | 1 | < 0.1% |
| -12.98 | 1 | < 0.1% |
| -8 | 6 | < 0.1% |
| -7.68 | 1 | < 0.1% |
| -7.5 | 10 | < 0.1% |
| -7.31 | 1 | < 0.1% |
| -7.21 | 1 | < 0.1% |
| -7 | 1 | < 0.1% |
| -6.49 | 48 | |
| -6.21 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 174.99 | 1 | |
| 65.96 | 1 | |
| 63 | 1 | |
| 49.99 | 1 | |
| 44.99 | 1 | |
| 40.47 | 1 | |
| 38.46 | 1 | |
| 37.47 | 1 | |
| 35.97 | 2 | |
| 34.99 | 1 |
Ignore3
Real number (ℝ)
| Distinct | 2862 |
|---|---|
| Distinct (%) | 13.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 243943.9 |
| Minimum | 20003 |
|---|---|
| Maximum | 870123 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 171.2 KiB |
Quantile statistics
| Minimum | 20003 |
|---|---|
| 5-th percentile | 20055 |
| Q1 | 40054 |
| median | 130039 |
| Q3 | 160033 |
| 95-th percentile | 860059 |
| Maximum | 870123 |
| Range | 850120 |
| Interquartile range (IQR) | 119979 |
Descriptive statistics
| Standard deviation | 316861.96 |
|---|---|
| Coefficient of variation (CV) | 1.2989132 |
| Kurtosis | -0.2049935 |
| Mean | 243943.9 |
| Median Absolute Deviation (MAD) | 89945 |
| Skewness | 1.2907514 |
| Sum | 5.3416396 × 10⁹ |
| Variance | 1.004015 × 10¹¹ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 130039 | 85 | 0.4% |
| 160019 | 81 | 0.4% |
| 40171 | 76 | 0.3% |
| 140070 | 69 | 0.3% |
| 20005 | 69 | 0.3% |
| 30045 | 67 | 0.3% |
| 20030 | 64 | 0.3% |
| 150015 | 64 | 0.3% |
| 40060 | 63 | 0.3% |
| 40007 | 62 | 0.3% |
| Other values (2852) | 21197 |
| Value | Count | Frequency (%) |
| 20003 | 2 | < 0.1% |
| 20004 | 3 | < 0.1% |
| 20005 | 69 | |
| 20006 | 23 | 0.1% |
| 20007 | 13 | 0.1% |
| 20008 | 13 | 0.1% |
| 20009 | 3 | < 0.1% |
| 20010 | 24 | 0.1% |
| 20011 | 23 | 0.1% |
| 20012 | 10 | < 0.1% |
| Value | Count | Frequency (%) |
| 870123 | 1 | < 0.1% |
| 870122 | 2 | < 0.1% |
| 870121 | 7 | < 0.1% |
| 870120 | 27 | |
| 870119 | 8 | < 0.1% |
| 870118 | 10 | < 0.1% |
| 870117 | 1 | < 0.1% |
| 870116 | 20 | |
| 870115 | 2 | < 0.1% |
| 870114 | 1 | < 0.1% |
Ignore4
Real number (ℝ)
| Distinct | 8862 |
|---|---|
| Distinct (%) | 40.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 229083 |
| Minimum | 3933 |
|---|---|
| Maximum | 924909 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 171.2 KiB |
Quantile statistics
| Minimum | 3933 |
|---|---|
| 5-th percentile | 12747.6 |
| Q1 | 18222 |
| median | 34329 |
| Q3 | 415483 |
| 95-th percentile | 897163 |
| Maximum | 924909 |
| Range | 920976 |
| Interquartile range (IQR) | 397261 |
Descriptive statistics
| Standard deviation | 343459.55 |
|---|---|
| Coefficient of variation (CV) | 1.49928 |
| Kurtosis | -0.37951513 |
| Mean | 229083 |
| Median Absolute Deviation (MAD) | 18433 |
| Skewness | 1.2233797 |
| Sum | 5.0162305 × 10⁹ |
| Variance | 1.1796447 × 10¹¹ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 18150 | 115 | 0.5% |
| 34329 | 115 | 0.5% |
| 34331 | 111 | 0.5% |
| 34330 | 95 | 0.4% |
| 28730 | 88 | 0.4% |
| 49882 | 81 | 0.4% |
| 35233 | 75 | 0.3% |
| 14896 | 75 | 0.3% |
| 49891 | 73 | 0.3% |
| 25555 | 71 | 0.3% |
| Other values (8852) | 20998 |
| Value | Count | Frequency (%) |
| 3933 | 2 | |
| 3985 | 1 | |
| 4087 | 2 | |
| 4509 | 1 | |
| 4610 | 2 | |
| 4796 | 1 | |
| 4904 | 1 | |
| 4905 | 1 | |
| 4936 | 1 | |
| 4946 | 1 |
| Value | Count | Frequency (%) |
| 924909 | 1 | < 0.1% |
| 923866 | 1 | < 0.1% |
| 923539 | 1 | < 0.1% |
| 923393 | 1 | < 0.1% |
| 923392 | 2 | |
| 923391 | 1 | < 0.1% |
| 923357 | 4 | |
| 923355 | 1 | < 0.1% |
| 923349 | 1 | < 0.1% |
| 923342 | 2 |
Ignore5
Real number (ℝ)
| Distinct | 636 |
|---|---|
| Distinct (%) | 2.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.0504667 |
| Minimum | -49.99 |
|---|---|
| Maximum | 349.99 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 9 |
| Negative (%) | < 0.1% |
| Memory size | 171.2 KiB |
Quantile statistics
| Minimum | -49.99 |
|---|---|
| 5-th percentile | 1.19 |
| Q1 | 2.27 |
| median | 3.29 |
| Q3 | 4.49 |
| 95-th percentile | 9.164 |
| Maximum | 349.99 |
| Range | 399.98 |
| Interquartile range (IQR) | 2.22 |
Descriptive statistics
| Standard deviation | 4.0523761 |
|---|---|
| Coefficient of variation (CV) | 1.0004714 |
| Kurtosis | 2445.2191 |
| Mean | 4.0504667 |
| Median Absolute Deviation (MAD) | 1.1 |
| Skewness | 30.78199 |
| Sum | 88693.07 |
| Variance | 16.421752 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 3.99 | 1185 | 5.4% |
| 2.99 | 1098 | 5.0% |
| 2.69 | 976 | 4.5% |
| 1.99 | 875 | 4.0% |
| 2.19 | 845 | 3.9% |
| 3.49 | 815 | 3.7% |
| 2.79 | 757 | 3.5% |
| 3.69 | 680 | 3.1% |
| 2.49 | 647 | 3.0% |
| 3.79 | 612 | 2.8% |
| Other values (626) | 13407 |
| Value | Count | Frequency (%) |
| -49.99 | 1 | < 0.1% |
| -12.98 | 1 | < 0.1% |
| -4.99 | 3 | |
| -2.99 | 1 | < 0.1% |
| -2.5 | 1 | < 0.1% |
| -1.39 | 2 | < 0.1% |
| 0.02 | 5 | |
| 0.03 | 2 | < 0.1% |
| 0.04 | 2 | < 0.1% |
| 0.05 | 3 |
| Value | Count | Frequency (%) |
| 349.99 | 1 | |
| 73.96 | 1 | |
| 69.09 | 1 | |
| 54.4 | 1 | |
| 53.37 | 1 | |
| 49.99 | 2 | |
| 44.97 | 2 | |
| 43.96 | 1 | |
| 41.98 | 2 | |
| 41.97 | 1 |
Ignore6
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 21.5 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| True | 14182 | 64.8% |
| False | 7715 | 35.2% |
Ignore7
Boolean
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 21.5 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| False | 21897 | 100.0% |
SegmentCode
Real number (ℝ)
| Distinct | 1166 |
|---|---|
| Distinct (%) | 5.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1553.5355 |
| Minimum | 3 |
|---|---|
| Maximum | 4495 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 171.2 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 95 |
| Q1 | 562 |
| median | 919 |
| Q3 | 2798 |
| 95-th percentile | 3468 |
| Maximum | 4495 |
| Range | 4492 |
| Interquartile range (IQR) | 2236 |
Descriptive statistics
| Standard deviation | 1246.2879 |
|---|---|
| Coefficient of variation (CV) | 0.80222685 |
| Kurtosis | -1.3416954 |
| Mean | 1553.5355 |
| Median Absolute Deviation (MAD) | 631 |
| Skewness | 0.50585432 |
| Sum | 34017767 |
| Variance | 1553233.5 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 772 | 325 | 1.5% |
| 723 | 303 | 1.4% |
| 484 | 236 | 1.1% |
| 778 | 232 | 1.1% |
| 2736 | 222 | 1.0% |
| 459 | 221 | 1.0% |
| 129 | 213 | 1.0% |
| 4478 | 191 | 0.9% |
| 2798 | 188 | 0.9% |
| 749 | 188 | 0.9% |
| Other values (1156) | 19578 |
| Value | Count | Frequency (%) |
| 3 | 48 | |
| 4 | 1 | < 0.1% |
| 5 | 28 | |
| 6 | 4 | < 0.1% |
| 7 | 36 | |
| 8 | 12 | 0.1% |
| 10 | 2 | < 0.1% |
| 11 | 7 | < 0.1% |
| 12 | 1 | < 0.1% |
| 13 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 4495 | 2 | < 0.1% |
| 4493 | 1 | < 0.1% |
| 4492 | 5 | < 0.1% |
| 4490 | 4 | < 0.1% |
| 4485 | 2 | < 0.1% |
| 4480 | 2 | < 0.1% |
| 4479 | 3 | < 0.1% |
| 4478 | 191 | |
| 3652 | 1 | < 0.1% |
| 3651 | 5 | < 0.1% |
Auto
The auto setting is an interpretable pairwise column metric of the following mapping:
- Variable_type-Variable_type : Method, Range
- Categorical-Categorical : Cramer's V, [0,1]
- Numerical-Categorical : Cramer's V, [0,1] (using a discretized numerical column)
- Numerical-Numerical : Spearman's ρ, [-1,1]
This configuration uses the recommended metric for each pair of columns.
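The mapping above amounts to a small dispatch on the pair of variable types. A minimal sketch follows; `auto_metric` is a hypothetical helper written for illustration and is not part of pandas-profiling's API:

```python
def auto_metric(type_a, type_b):
    """Return (method, value range) for a pair of variable types.

    Hypothetical helper that mirrors the mapping listed in the report.
    """
    kinds = {type_a, type_b}
    if kinds == {"Numerical"}:
        return "Spearman's rho", (-1.0, 1.0)
    if kinds <= {"Numerical", "Categorical"}:
        # Categorical-Categorical, or Numerical-Categorical with the
        # numerical column discretized before computing Cramer's V.
        return "Cramer's V", (0.0, 1.0)
    raise ValueError(f"unsupported variable types: {type_a!r}, {type_b!r}")


print(auto_metric("Numerical", "Numerical"))
print(auto_metric("Numerical", "Categorical"))
print(auto_metric("Categorical", "Categorical"))
```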
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and 1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
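The calculation described above can be sketched in plain Python as the Pearson correlation of the rank variables. The helper names (`ranks`, `pearson`, `spearman`) and the toy data are assumptions for illustration; ties are given average ranks, which is a common convention but not stated in the report.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Pearson correlation of the rank variables."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: nonlinear but monotonic
print(spearman(x, y), pearson(x, y))
```

On this toy data ρ is 1 up to floating-point error while Pearson's r stays below 1, which illustrates why ρ picks up nonlinear monotonic relationships.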
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
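A short sketch of this definition and of the invariance property, using invented toy data and a hypothetical helper name `pearson_r`. The scale factors are positive here; a negative factor would flip the sign of r.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


# Toy data, roughly linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(x, y)

# Separate changes of location and (positive) scale leave r unchanged.
x_moved = [10 * a + 3 for a in x]
y_moved = [0.5 * b - 7 for b in y]
r_moved = pearson_r(x_moved, y_moved)

print(r, r_moved)
```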
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and 1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
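The pair-counting definition above corresponds to what is usually called tau-a. A minimal sketch on toy data follows; the helper name `kendall_tau_a` is an assumption, and library implementations often use a tie-corrected variant (tau-b) that can give different values when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in x or y count as neither concordant nor discordant.
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair: 5 concordant, 1 discordant
print(kendall_tau_a(x, y))
```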
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.
A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
First rows
| | StoreNumber | ItemCode | TxnTime | SaleFlag | Quantity | Ignore1 | Amount | Ignore3 | Ignore4 | Ignore5 | Ignore6 | Ignore7 | SegmentCode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 108 | 8.410580e+10 | 09-05-2013 00:27 | N | 1 | 0.0 | 11.79 | 840200 | 896262 | 11.79 | N | N | 1002 |
| 1 | 108 | 8.947000e+10 | 09-05-2013 00:27 | N | 1 | 0.0 | 1.29 | 840200 | 832102 | 1.29 | N | N | 778 |
| 2 | 108 | 2.840016e+09 | 09-05-2013 00:27 | Y | 2 | 0.0 | 7.00 | 840200 | 893943 | 8.58 | Y | N | 1071 |
| 3 | 108 | 2.840016e+09 | 09-05-2013 00:27 | Y | 2 | 0.0 | 6.00 | 840200 | 893952 | 6.98 | Y | N | 1053 |
| 4 | 108 | 4.127102e+09 | 09-05-2013 00:27 | N | 1 | 0.0 | 3.69 | 840200 | 894107 | 3.69 | N | N | 3541 |
| 5 | 108 | 2.363742e+09 | 09-05-2013 00:27 | N | 1 | 0.0 | 3.99 | 840200 | 918580 | 3.99 | N | N | 3418 |
| 6 | 108 | 8.182900e+10 | 09-05-2013 00:27 | N | 2 | 0.0 | 2.58 | 840200 | 917619 | 2.58 | N | N | 775 |
| 7 | 108 | 8.182900e+10 | 09-05-2013 00:27 | N | 1 | 0.0 | 1.29 | 840200 | 917620 | 1.29 | N | N | 778 |
| 8 | 108 | 8.182900e+10 | 09-05-2013 00:27 | N | 2 | 0.0 | 2.58 | 840200 | 917621 | 2.58 | N | N | 778 |
| 9 | 108 | 2.100001e+09 | 09-05-2013 00:27 | N | 2 | 0.0 | 2.38 | 840200 | 23795 | 2.38 | N | N | 2751 |
Last rows
| | StoreNumber | ItemCode | TxnTime | SaleFlag | Quantity | Ignore1 | Amount | Ignore3 | Ignore4 | Ignore5 | Ignore6 | Ignore7 | SegmentCode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21887 | 233 | 7.880011e+09 | 09-05-2013 23:42 | Y | 1 | 0.0 | 1.00 | 80058 | 55514 | 1.49 | Y | N | 754 |
| 21888 | 233 | 7.078400e+09 | 09-05-2013 23:42 | Y | 1 | 0.0 | 2.99 | 80058 | 444365 | 3.49 | Y | N | 3163 |
| 21889 | 233 | 4.900005e+09 | 09-05-2013 23:42 | Y | 1 | 0.0 | 1.00 | 80058 | 36299 | 1.19 | Y | N | 1307 |
| 21890 | 233 | 3.656328e+10 | 09-05-2013 23:50 | Y | 1 | 0.0 | 0.88 | 80059 | 18800 | 1.59 | Y | N | 906 |
| 21891 | 233 | 1.200011e+09 | 09-05-2013 23:52 | Y | 2 | 0.0 | 3.00 | 80060 | 897187 | 5.58 | Y | N | 1296 |
| 21892 | 233 | 6.337120e+10 | 09-05-2013 23:52 | N | 1 | 0.0 | 7.99 | 80060 | 844339 | 7.99 | N | N | 2410 |
| 21893 | 233 | 6.166761e+10 | 09-05-2013 23:54 | Y | 1 | 0.0 | 2.50 | 80061 | 20311 | 2.99 | Y | N | 3248 |
| 21894 | 233 | 7.084700e+09 | 09-05-2013 23:54 | Y | 1 | 0.0 | 1.75 | 80061 | 591019 | 2.49 | Y | N | 1296 |
| 21895 | 233 | 1.708200e+09 | 09-05-2013 23:54 | N | 1 | 0.0 | 1.99 | 80061 | 53401 | 1.99 | N | N | 1076 |
| 21896 | 233 | 7.148102e+09 | 09-05-2013 23:55 | N | 1 | 0.0 | 3.79 | 80062 | 15076 | 3.79 | N | N | 734 |